Here are the slides I presented for a ClickHouse SF Bay Area Meetup in
July 2022, hosted by Altinity. They are about Akvorado, a
network flow collector and visualizer, and notably on how it relies on
ClickHouse, a column-oriented database.
Slides in PDF format
The meetup was recorded and available on YouTube. Here is the part
relevant to my presentation, with subtitles:1
I got a few questions about how to get information from the higher
layers, like HTTP. As my use case for Akvorado was at the network
edge, my answers were mostly negative. However, as sFlow is
extensible, when collecting flows from Linux servers instead, you
could embed additional data and they could be exported as well.
I also got a question about doing aggregation in a single table.
ClickHouse can aggregate automatically data using TTL. My answer for
not doing that is partial. There is another reason: the retention
periods of the various tables may overlap. For example, the main table
keeps data for 15 days, but even in these 15 days, if I do a query on
a 12-hour window, it is faster to use the flows_1m0s aggregated
table, unless I request something about ports and IP addresses.
To generate the subtitles, I have used Amazon
Transcribe, the speech-to-text solution from Amazon AWS.
Unfortunately, there is no en-FR language available, which would
have been useful for my terrible accent. While the
subtitles were 100% accurate when the host, Robert Hodge from
Altinity, was speaking, the success rate on my talk was quite
lower. I had to rewrite almost all sentences. However, using
speech-to-text is still useful to get the timings, as it is also
something requiring a lot of work to do manually. ↩︎
( 2
min )